Binary Classification with Instance and Label Dependent Label Noise
Learning with label dependent label noise has been extensively explored in
both theory and practice; however, dealing with instance (i.e., feature) and
label dependent label noise continues to be a challenging task. The difficulty
arises from the fact that the noise rate varies for each instance, making it
challenging to estimate accurately. The question of whether it is possible to
learn a reliable model using only noisy samples remains unresolved. We answer
this question with a theoretical analysis that provides matching upper and
lower bounds. Surprisingly, our results show that, without any additional
assumptions, empirical risk minimization achieves the optimal excess risk
bound. First, we derive a novel excess risk bound proportional to the
noise level, which holds in very general settings, by comparing the empirical
risk minimizers obtained from clean samples and noisy samples. Second, we show
that the minimax lower bound for the 0-1 loss is a constant proportional to the
average noise rate. Our findings suggest that learning solely with noisy
samples is impossible without access to clean samples or strong assumptions on
the distribution of the data.
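To make the setting concrete, here is a minimal sketch (not the paper's construction) of empirical risk minimization on labels corrupted by instance-dependent noise: the flip probability depends on each instance's distance to the decision boundary, and both the noise model and the logistic-regression learner are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary task: clean labels given by the sign of a linear score.
n, d = 2000, 2
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -1.0])
y = (X @ w_true > 0).astype(float)

# Instance-dependent label noise: the flip probability shrinks with the
# margin |w_true . x|, so points near the boundary are noisier.
flip_prob = 0.4 * np.exp(-np.abs(X @ w_true))
flips = rng.random(n) < flip_prob
y_noisy = np.where(flips, 1.0 - y, y)

def fit_logreg(X, y, lr=0.5, steps=500):
    """Plain ERM with the logistic loss, fit by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

w_clean = fit_logreg(X, y)        # ERM on clean samples
w_noisy = fit_logreg(X, y_noisy)  # ERM on noisy samples

# Evaluate both minimizers against the clean labels.
acc = lambda w: ((X @ w > 0) == (y > 0.5)).mean()
print(f"avg noise rate: {flip_prob.mean():.3f}")
print(f"clean-ERM acc:  {acc(w_clean):.3f}")
print(f"noisy-ERM acc:  {acc(w_noisy):.3f}")
```

Comparing the two accuracies illustrates the excess risk incurred by training on noisy rather than clean samples.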
Generalization Bounds in the Predict-then-Optimize Framework
The predict-then-optimize framework is fundamental in many practical
settings: predict the unknown parameters of an optimization problem, and then
solve the problem using the predicted values of the parameters. A natural loss
function in this environment is to consider the cost of the decisions induced
by the predicted parameters, in contrast to the prediction error of the
parameters. This loss function was recently introduced in Elmachtoub and Grigas
(2017) and referred to as the Smart Predict-then-Optimize (SPO) loss. In this
work, we seek to provide bounds on how well the performance of a prediction
model fit on training data generalizes out-of-sample, in the context of the SPO
loss. Since the SPO loss is non-convex and non-Lipschitz, standard results for
deriving generalization bounds do not apply.
We first derive bounds based on the Natarajan dimension that, in the case of
a polyhedral feasible region, scale at most logarithmically in the number of
extreme points, but, in the case of a general convex feasible region, have
linear dependence on the decision dimension. By exploiting the structure of the
SPO loss function and a key property of the feasible region, which we denote as
the strength property, we can dramatically improve the dependence on the
decision and feature dimensions. Our approach and analysis rely on placing a
margin around problematic predictions that do not yield unique optimal
solutions, and then providing generalization bounds in the context of a
modified margin SPO loss function that is Lipschitz continuous. Finally, we
characterize the strength property and show that the modified SPO loss can be
computed efficiently for both strongly convex bodies and polytopes with an
explicit extreme point representation.
Comment: Preliminary version in NeurIPS 201
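As a small illustration of the SPO loss for a polytope given by an explicit extreme point representation, the sketch below evaluates the decision induced by a predicted cost vector under the true cost; the unit-simplex feasible region and the cost vectors are illustrative, not from the paper.

```python
import numpy as np

def spo_loss(c_hat, c_true, extreme_points):
    """SPO loss for min_{w in S} c^T w, with S given by its extreme points.

    The decision induced by the predicted cost c_hat is charged under the
    true cost c_true, minus the optimal cost achievable under c_true.
    """
    V = np.asarray(extreme_points, dtype=float)  # one extreme point per row
    w_pred = V[np.argmin(V @ c_hat)]             # decision from the prediction
    true_opt = (V @ c_true).min()                # best achievable true cost
    return float(w_pred @ c_true - true_opt)

# Toy polytope: the unit simplex in R^3 (extreme points = standard basis).
simplex = np.eye(3)
c_true = np.array([1.0, 2.0, 3.0])

loss_good = spo_loss(np.array([0.9, 2.1, 3.0]), c_true, simplex)  # -> 0.0
loss_bad = spo_loss(np.array([2.5, 0.5, 3.0]), c_true, simplex)   # -> 1.0
print(loss_good, loss_bad)
```

A prediction that induces the true optimal decision incurs zero SPO loss even when the predicted costs themselves are inaccurate, which is exactly the sense in which the loss measures decision error rather than prediction error.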
New Analysis and Results for the Conditional Gradient Method
We present new results for the conditional gradient method (also known as the Frank-Wolfe method). We derive computational guarantees for arbitrary step-size sequences, which are then applied to various step-size rules, including simple averaging and constant step-sizes. We also develop step-size rules and computational guarantees that depend naturally on the warm-start quality of the initial (and subsequent) iterates. Our results include computational guarantees for both duality/bound gaps and the so-called Wolfe gaps. Lastly, we present complexity bounds in the presence of approximate computation of gradients and/or linear optimization subproblem solutions.
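The method itself can be sketched in a few lines: at each iterate, solve a linear optimization subproblem over the feasible region, then move toward its solution with a step-size rule. The sketch below uses the classic 2/(k+2) rule and tracks the Wolfe gap on a toy projection problem; the simplex feasible region and quadratic objective are illustrative choices, not from the paper.

```python
import numpy as np

def frank_wolfe(grad, lin_oracle, x0, steps=2000):
    """Conditional gradient (Frank-Wolfe) with the 2/(k+2) step-size.

    grad: gradient of the objective; lin_oracle: solves the linear
    subproblem argmin_{v in S} g^T v. Also returns the final Wolfe gap
    grad(x_k)^T (x_k - v_k), a certificate bounding the optimality gap.
    """
    x = x0.copy()
    for k in range(steps):
        g = grad(x)
        v = lin_oracle(g)                 # linear optimization subproblem
        wolfe_gap = g @ (x - v)
        x = x + 2.0 / (k + 2) * (v - x)   # convex step stays feasible
    return x, wolfe_gap

# Toy problem: project b onto the unit simplex, f(x) = 0.5 * ||x - b||^2.
b = np.array([0.2, 0.5, 0.3])
grad = lambda x: x - b
# Linear minimization over the simplex picks the smallest-gradient vertex.
lin_oracle = lambda g: np.eye(len(g))[np.argmin(g)]

x, gap = frank_wolfe(grad, lin_oracle, np.array([1.0, 0.0, 0.0]))
print(np.round(x, 3), f"Wolfe gap = {gap:.5f}")
```

Because each update is a convex combination of feasible points, the iterates never leave the feasible region, and the Wolfe gap shrinks along with the suboptimality.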